dictionary definition
Do Large Language Models Understand Word Senses?
Meconi, Domenico, Stirpe, Simone, Martelli, Federico, Lavalle, Leonardo, Navigli, Roberto
Understanding the meaning of words in context is a fundamental capability for Large Language Models (LLMs). Despite extensive evaluation efforts, the extent to which LLMs show evidence that they truly grasp word senses remains underexplored. In this paper, we address this gap by evaluating both i) the Word Sense Disambiguation (WSD) capabilities of instruction-tuned LLMs, comparing their performance to state-of-the-art systems specifically designed for the task, and ii) the ability of two top-performing open- and closed-source LLMs to understand word senses in three generative settings: definition generation, free-form explanation, and example generation. Notably, we find that, in the WSD task, leading models such as GPT-4o and DeepSeek-V3 achieve performance on par with specialized WSD systems, while also demonstrating greater robustness across domains and levels of difficulty. In the generation tasks, results reveal that LLMs can explain the meaning of words in context up to 98% accuracy, with the highest performance observed in the free-form explanation task, which best aligns with their generative capabilities.
Scientists reveal exactly what makes someone a 'badass' - so, do you meet the strict criteria?
If you've always wondered what it takes to be a badass, a new study reveals the strict criteria. Following questionnaires involving over 2,000 people, researchers in the US have officially improved on the dictionary definition of the term. A badass has an 'outer toughness' (consisting of physical strength, a 'formidable presence', or both), an inner toughness (such as moral resilience and courage), or both. That's why 'radically' different men and women, ranging from peace advocates to fierce warriors, can be considered badasses, according to the experts. Famous badasses include Genghis Khan (AD 1162 to 1227), the brutal founder of the Mongol Empire responsible for the deaths of around 40 million people, they say.
Word Definitions from Large Language Models
Dictionary definitions are historically the arbitrator of what words mean, but this primacy has come under threat by recent progress in NLP, including word embeddings and generative models like ChatGPT. We present an exploratory study of the degree of alignment between word definitions from classical dictionaries and these newer computational artifacts. Specifically, we compare definitions from three published dictionaries to those generated from variants of ChatGPT. We show that (i) definitions from different traditional dictionaries exhibit more surface form similarity than do model-generated definitions, (ii) the ChatGPT definitions are highly accurate, comparable to traditional dictionaries, and (iii) ChatGPT-based embedding definitions retain their accuracy even on low-frequency words, much better than GloVe and FastText word embeddings.
Multi-Relational Hyperbolic Word Embeddings from Natural Language Definitions
Valentino, Marco, Carvalho, Danilo S., Freitas, André
Neural-based word embeddings using solely distributional information have consistently produced useful meaning representations for downstream tasks. However, existing approaches often result in representations that are hard to interpret and control. Natural language definitions, on the other side, possess a recursive, self-explanatory semantic structure that can support novel representation learning paradigms able to preserve explicit conceptual relations and constraints in the vector space. This paper proposes a neuro-symbolic, multi-relational framework to learn word embeddings exclusively from natural language definitions by jointly mapping defined and defining terms along with their corresponding semantic relations. By automatically extracting the relations from definitions corpora and formalising the learning problem via a translational objective, we specialise the framework in hyperbolic space to capture the hierarchical and multi-resolution structure induced by the definitions. An extensive empirical analysis demonstrates that the framework can help impose the desired structural constraints while preserving the mapping required for controllable and interpretable semantic navigation. Moreover, the experiments reveal the superiority of the hyperbolic word embeddings over the euclidean counterparts and demonstrate that the multi-relational framework can obtain competitive results when compared to state-of-the-art neural approaches (including Transformers), with the advantage of being significantly more efficient and intrinsically interpretable.
Evaluation of Automatically Constructed Word Meaning Explanations
Stará, Marie, Rychlý, Pavel, Horák, Aleš
Preparing exact and comprehensive word meaning explanations is one of the key steps in the process of monolingual dictionary writing. In standard methodology, the explanations need an expert lexicographer who spends a substantial amount of time checking the consistency between the descriptive text and corpus evidence. In the following text, we present a new tool that derives explanations automatically based on collective information from very large corpora, particularly on word sketches. We also propose a quantitative evaluation of the constructed explanations, concentrating on explanations of nouns. The methodology is to a certain extent language independent; however, the presented verification is limited to Czech and English. We show that the presented approach makes it possible to create explanations that contain data useful for understanding the word meaning in approximately 90% of cases. However, in many cases, the result requires post-editing to remove redundant information.
The Edge of Glory?: Will DABUS 'success' in South Africa and Australia be repeated in the UK? (via Passle)
Lady Gaga sings 'I'm on the edge of glory and I'm hanging on a moment of truth'. Until now, the longstanding crusade to allow inventions generated by the AI machine DABUS to be patentable under existing national patent laws across different jurisdictions had not had much success. Lawyers with the "Artificial Inventor Project" had filed patent applications around the world for DABUS' 'inventions' but received a steady stream of rejections from national IP offices and courts (for instance see our Lens posts on refusals by the UKIPO, UK High Court, EPO and USPTO). Surprisingly, DABUS has had better results in recent weeks in respect of its South African and Australian applications. Is this the edge of glory?
Humans and AI: Should We Describe AI as Autonomous?
Beware the hype about AI systems. Although AI is powerful and generates trillions of dollars of economic value across the world, what you see in science fiction movies remains pure fiction. In this blog post, I will focus on the use of the word autonomous, the dangers of using it with stakeholders, and, in the context of customer experience, the inaccurate perception that all things can be automated, eliminating the need for interactions between employees and customers. According to the dictionary, autonomous means "having the freedom to govern itself or control its own affairs." To have autonomy is to have the freedom to exercise self-determination, to rule oneself, to make decisions in accordance with one's own goals, without external interference.
Look It Up: Bilingual and Monolingual Dictionaries Improve Neural Machine Translation
Zhong, Xing Jie, Chiang, David
Despite advances in neural machine translation (NMT) quality, rare words continue to be problematic. For humans, the solution to the rare-word problem has long been dictionaries, but dictionaries cannot be straightforwardly incorporated into NMT. In this paper, we describe a new method for "attaching" dictionary definitions to rare words so that the network can learn the best way to use them. We demonstrate improvements of up to 3.1 BLEU using bilingual dictionaries and up to 0.7 BLEU using monolingual source-language dictionaries.
Multi-channel Reverse Dictionary Model
Zhang, Lei, Qi, Fanchao, Liu, Zhiyuan, Wang, Yasheng, Liu, Qun, Sun, Maosong
A reverse dictionary takes the description of a target word as input and outputs the target word together with other words that match the description. Inspired by the description-to-word inference process of humans, we propose the multi-channel reverse dictionary model, which can mitigate the two problems simultaneously. Our model comprises a sentence encoder and multiple predictors. The predictors are expected to identify different characteristics of the target word from the input query. We evaluate our model on English and Chinese datasets including both dictionary definitions and human-written descriptions. Experimental results show that our model achieves state-of-the-art performance, and even outperforms the most popular commercial reverse dictionary system on the human-written description dataset. We also conduct quantitative analyses and a case study to demonstrate the effectiveness and robustness of our model. All the code and data of this work can be obtained on https://github.com/thunlp/MultiRD.
Introduction
A regular (forward) dictionary maps words to definitions while a reverse dictionary (Sierra 2000) does the opposite and maps descriptions to corresponding words. In Figure 1, for example, a regular dictionary tells you that "expressway" is "a wide road that allows traffic to travel fast", and when you input "a road where cars go very quickly without stopping" to a reverse dictionary, it might return "expressway" together with other semantically similar words like "freeway". Reverse dictionaries have great practical value.
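To make the forward/reverse distinction concrete, here is a minimal sketch of a reverse lookup. It is a plain word-overlap baseline, not the multi-channel model from the paper (which uses a sentence encoder and several predictors). The "expressway" definition and the query come from the passage above; the other two entries, the stopword list, and the function names are assumptions made up for illustration.

```python
# Toy forward dictionary. Only the "expressway" entry is taken from the
# passage; "bicycle" and "river" are invented so there is something to rank.
FORWARD = {
    "expressway": "a wide road that allows traffic to travel fast",
    "bicycle": "a vehicle with two wheels that you ride by pushing pedals",
    "river": "a large natural stream of water flowing to the sea",
}

# Hypothetical minimal stopword list, chosen only for this example.
STOPWORDS = {"a", "that", "to", "the", "of", "with", "you", "by", "where"}


def content_words(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return {w for w in text.lower().split() if w not in STOPWORDS}


def reverse_lookup(description):
    """Rank headwords by Jaccard overlap between description and definition."""
    query = content_words(description)
    scored = []
    for word, definition in FORWARD.items():
        words = content_words(definition)
        union = query | words
        score = len(query & words) / len(union) if union else 0.0
        scored.append((word, score))
    # Highest overlap first; ties keep dictionary insertion order.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


ranking = reverse_lookup("a road where cars go very quickly without stopping")
print(ranking[0][0])
```

The only shared content word here is "road", which shows how weak surface overlap is: a paraphrase with no shared words would score zero against every entry, which is the kind of gap learned encoders are meant to close.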
Manipulating Word Representations, and Preparing Students for Coding Jobs?
Recent research in natural language processing using the program word2vec gives manipulations of word representations that look a lot like semantics produced by vector math. For vector calculations to produce semantics would be remarkable, indeed. The word vectors are drawn from context, big, huge context. And, at least roughly, the meaning of a word is its use (in context). Is it possible some question is begged here?
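The "vector math" in question is usually shown as an analogy query: subtract one word vector from another, add a third, and look for the nearest remaining word. The sketch below does this with tiny hand-made vectors. The numbers are invented for illustration and are not word2vec output, so the result shows only the mechanics of the calculation, not evidence about semantics.

```python
import math

# Toy 3-dimensional vectors, invented for this example. Real word2vec
# vectors are learned from large corpora and have far more dimensions.
VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [0.5, 0.5, 0.5],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def analogy(a, b, c):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding inputs."""
    target = [x - y + z for x, y, z in zip(VECTORS[a], VECTORS[b], VECTORS[c])]
    candidates = [w for w in VECTORS if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(VECTORS[w], target))


result = analogy("king", "man", "woman")
print(result)
```

Because these vectors were built by hand so that the second and third coordinates separate the "man" and "woman" rows, the query lands on "queen" by construction. That is one way to read the post's worry about question-begging: the arithmetic can only return structure that is already present in the vectors.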